data challenge
Assessing Extrapolation of Peaks Over Thresholds with Martingale Testing
de Vilmarest, Joseph, Wintenberger, Olivier
We present the winning strategy for the EVA2025 Data Challenge, which aimed to estimate the probability of extreme precipitation events. These events occurred at most once in the dataset, making the challenge fundamentally one of extrapolating extreme values. Given the scarcity of extreme events, we argue that a simple, robust modeling approach is essential. We adopt univariate rather than multivariate models and model Peaks Over Thresholds using Extreme Value Theory. Specifically, we fit an exponential distribution to the exceedances of the target variable above a high quantile (after seasonal adjustment). The novelty of our approach lies in using martingale testing to evaluate the extrapolation power of the procedure and to select the level of the high quantile agnostically. While this method has several limitations, we believe that framing extrapolation as a game opens the door to other agnostic approaches in Extreme Value Analysis.
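The peaks-over-threshold step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, seasonal adjustment is omitted, the threshold is a plain empirical quantile, and the martingale test is a generic two-sided betting scheme rather than the specific construction used in the paper. The underlying idea is standard: if the fitted tail probabilities p_t are calibrated, each betting factor has conditional mean 1, so the wealth is a non-negative martingale started at 1 and, by Ville's inequality, exceeds 1/alpha with probability at most alpha.

```python
import math
import random
from typing import List, Sequence, Tuple


def fit_pot_exponential(x: Sequence[float], q: float) -> Tuple[float, float, float]:
    """Fit an exponential model to exceedances above the empirical q-quantile.

    Returns (u, zeta, sigma): threshold, fraction of exceedances, and the
    exponential scale, whose maximum-likelihood estimate is the mean excess.
    """
    xs = sorted(x)
    u = xs[int(q * (len(xs) - 1))]  # simple (lower) empirical quantile
    excess = [v - u for v in x if v > u]
    if not excess:
        raise ValueError("no exceedances above the threshold")
    zeta = len(excess) / len(x)
    sigma = sum(excess) / len(excess)
    return u, zeta, sigma


def tail_prob(z: float, u: float, zeta: float, sigma: float) -> float:
    """Extrapolated P(X > z) = zeta * exp(-(z - u) / sigma), valid for z >= u."""
    if z < u:
        raise ValueError("z must lie at or above the threshold u")
    return zeta * math.exp(-(z - u) / sigma)


def betting_wealth(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Wealth of a two-sided bet against forecasts p_t of binary events Y_t.

    Each factor 1 +/- lam * (y - p) has mean 1 when P(Y_t = 1) = p_t, so the
    average of the two wealth processes is a test martingale started at 1.
    """
    up = down = 1.0
    for p, y in zip(probs, outcomes):
        if not 0.0 < p < 1.0:
            raise ValueError("forecast probabilities must lie in (0, 1)")
        # Stake chosen so every factor stays >= 0.5: wealth never reaches zero.
        lam = 0.5 * min(1.0 / p, 1.0 / (1.0 - p))
        up *= 1.0 + lam * (y - p)    # gains if exceedances are more frequent than forecast
        down *= 1.0 - lam * (y - p)  # gains if exceedances are rarer than forecast
    return 0.5 * (up + down)


def sequential_wealth(x: Sequence[float], q: float, z: float, burn_in: int) -> float:
    """Refit on past data at each step, then bet on whether x[t] exceeds z."""
    probs: List[float] = []
    outcomes: List[int] = []
    for t in range(burn_in, len(x)):
        u, zeta, sigma = fit_pot_exponential(x[:t], q)
        probs.append(tail_prob(z, u, zeta, sigma))
        outcomes.append(1 if x[t] > z else 0)
    return betting_wealth(probs, outcomes)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.expovariate(1.0) for _ in range(1000)]  # synthetic, not challenge data
    alpha = 0.05
    for q in (0.8, 0.9, 0.95):
        w = sequential_wealth(data, q, z=4.0, burn_in=300)
        verdict = "rejected" if w >= 1 / alpha else "not rejected"
        print(f"q={q}: final wealth {w:.3f} ({verdict})")
```

Running the test for several candidate quantile levels, as in the loop above, is one hedged way to compare thresholds: levels whose wealth grows past 1/alpha are flagged as extrapolating poorly on this data.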
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
Overview of the VLSP 2022 -- Abmusu Shared Task: A Data Challenge for Vietnamese Abstractive Multi-document Summarization
Tran, Mai-Vu, Le, Hoang-Quynh, Can, Duy-Cat, Nguyen, Quoc-An
This paper presents an overview of the VLSP 2022 Vietnamese abstractive multi-document summarization (Abmusu) shared task for Vietnamese news. The task was hosted at the 9th annual workshop on Vietnamese Language and Speech Processing (VLSP 2022). The goal of the Abmusu shared task is to develop summarization systems that automatically create abstractive summaries for a set of documents on a topic. The model input is multiple news documents on the same topic, and the corresponding output is an abstractive summary of those documents. Within the scope of the Abmusu shared task, we focus only on Vietnamese news summarization and build a human-annotated dataset of 1,839 documents in 600 clusters, collected from Vietnamese news in 8 categories. Participating models are evaluated and ranked by ROUGE-2 F1 score, the most common evaluation metric for document summarization.
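The ranking metric can be sketched as the harmonic mean of bigram precision and bigram recall between a system summary and a reference, with each matching bigram counted at most as often as it appears in both. This minimal Python version assumes whitespace tokenization, lowercasing, and a single reference; the official evaluation may tokenize Vietnamese text differently (for example with word segmentation) or handle multiple references, so its scores are not necessarily comparable with the shared-task results.

```python
from collections import Counter
from typing import List


def bigram_counts(tokens: List[str]) -> Counter:
    """Count adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))


def rouge2_f1(candidate: str, reference: str) -> float:
    """ROUGE-2 F1 with clipped bigram matches (whitespace tokens, lowercased)."""
    cand = bigram_counts(candidate.lower().split())
    ref = bigram_counts(reference.lower().split())
    overlap = sum((cand & ref).values())  # '&' keeps the minimum count per bigram
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(rouge2_f1("the cat sat on the mat", "the cat sat on a mat"))
```

In the usage line, 3 of the 5 bigrams match on each side, so precision and recall are both 0.6 and so is their harmonic mean.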
- Asia > Vietnam > Hanoi > Hanoi (0.05)
- North America > United States > Maryland > Baltimore (0.04)
- Research Report (1.00)
- Overview (1.00)
1-D Residual Convolutional Neural Network coupled with Data Augmentation and Regularization for the ICPHM 2023 Data Challenge
Kreuzer, Matthias, Kellermann, Walter
Industrial machines are subjected to heavy stress conditions and are therefore susceptible to defects and resulting malfunctions. Even small defects can already have significant consequences, as they can cause additional costs by, e.g., delaying industrial production through unexpected downtime, or can even put human lives in danger, e.g., resulting from bearing failure in … To avoid these drawbacks, we propose a residual Convolutional Neural Network (CNN) that stands out for its small number of parameters, and we additionally employ data augmentation techniques to further mitigate possible overfitting, for the ICPHM 2023 Data Challenge on Industrial Systems' Health Monitoring using Vibration Signal Analysis [10].
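The two ingredients named in this abstract, a residual convolutional block and data augmentation, can be illustrated on a single-channel 1-D signal in plain Python. This is a hedged sketch, not the authors' architecture: the kernel values, the block layout (two convolutions with a skip connection), and the three augmentations (random circular shift, amplitude scaling, additive Gaussian noise) are assumptions chosen for illustration, and a real model would use a deep learning framework with learned, multi-channel weights.

```python
import math
import random
from typing import List, Sequence


def conv1d_same(x: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> List[float]:
    """1-D cross-correlation with zero padding; output length equals input length."""
    if len(kernel) % 2 == 0:
        raise ValueError("use an odd kernel length for 'same' padding")
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [bias + sum(w * padded[i + j] for j, w in enumerate(kernel)) for i in range(len(x))]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def residual_block(x: Sequence[float], k1: Sequence[float], k2: Sequence[float]) -> List[float]:
    """relu(x + conv2(relu(conv1(x)))): the skip connection adds the input back,
    so the convolutions only need to learn a correction to the identity."""
    h = conv1d_same(relu(conv1d_same(x, k1)), k2)
    return relu([a + b for a, b in zip(x, h)])


def augment(x: Sequence[float], rng: random.Random, noise_std: float = 0.01,
            scale_range: tuple = (0.9, 1.1)) -> List[float]:
    """Random circular shift, amplitude scaling, and additive Gaussian noise."""
    s = rng.randrange(len(x))
    shifted = list(x[-s:]) + list(x[:-s])  # rotate right by s samples (s = 0 keeps x)
    scale = rng.uniform(*scale_range)
    return [scale * a + rng.gauss(0.0, noise_std) for a in shifted]


rng = random.Random(0)
signal = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]  # toy vibration signal
features = residual_block(augment(signal, rng), [0.25, 0.5, 0.25], [-0.1, 0.2, -0.1])
print(len(features))  # 64
```

Because each augmented copy is a plausible variant of the same recording, training on such copies exposes a small network to more variation without collecting new data, which is the overfitting argument the abstract makes.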
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)
How a Wildlife AI Platform Solved its Data Challenge - InformationWeek
Anyone working in data management and data science can attest to how challenging and time-consuming it is to map data from a new source into a platform where it can be cleaned, validated, and ultimately analyzed and used to train algorithms. After all, your algorithms are only as good as the data used to train them. Now imagine these data sets coming from hundreds of external users who have collected the data with any number of systems, from Excel files to actual shoeboxes full of photos. That is the challenge Wild Me, a non-profit that provides machine learning and artificial intelligence services for wildlife conservation, has faced over more than a decade of operation. The organization builds open software and AI for the conservation research community.
- North America > United States (0.05)
- North America > Mexico (0.05)
- Indian Ocean (0.05)
- Government > Military (0.35)
- Information Technology (0.30)
- Information Technology > Artificial Intelligence (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.40)
Potential IBM Watson Health Sale Puts Focus on Data Challenges
Even so, some experts found that it can be difficult to apply AI to treating complex medical conditions. Having access to data that represents patient populations broadly has been a challenge, experts told the Journal, and gaps in knowledge about complex diseases may not be fully captured in clinical databases. "I believe that we're many years away from AI products that really positively impact clinical care for many patients," said Bob Kocher, a partner at venture-capital firm Venrock who focuses on healthcare IT and services investments and who was a White House health adviser under President Barack Obama. Software that makes recommendations on personal medical treatments needs data on what actions have worked in the past. But data on medical histories and treatment outcomes aren't always complete, may be recorded in different formats, and may be sitting in various systems owned by insurance carriers, health providers and other organizations.
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.57)
- Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.40)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Case Based Reasoning (0.40)
COVID-19 Vaccine Distribution: Addressing Data Challenges
Distributing the COVID-19 vaccine is a logistical puzzle that teeters on a delicate structure of chemists, data scientists, freight drivers, healthcare professionals, distributors, state health departments, and policy makers. When even one piece of this structure is out of balance, the whole vaccine distribution tower could tumble. The U.S. Food and Drug Administration (FDA) has authorized two vaccines under Emergency Use Authorizations (EUAs), deeming them safe for distribution and use based on data from extensive clinical trials and from the manufacturers. However, supply limitations and other pressing challenges have compounded the logistical complexities and contributed to the slow rollout and incomplete shipment of doses. With a projected 600 million doses required in the U.S. and demand currently outweighing supply, a vaccination effort of this scale carries risks and challenges across end-to-end vaccine distribution management.
- Health & Medicine > Therapeutic Area > Vaccines (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Information Technology > Data Science (0.85)
- Information Technology > Artificial Intelligence (0.53)
- Information Technology > Architecture > Real Time Systems (0.35)
How To Tackle the Data Challenges of Pharmacovigilance?
Cognitive computing can transform the practice of pharmacovigilance (PV) from a tedious, resource-intensive process into a dynamic and efficient method focused on risk management. FREMONT, CA: Pharmacovigilance covers the activities related to the detection, understanding, assessment, and prevention of adverse effects of pharmaceutical products, so it has to navigate a large volume of complex data. This complexity cannot be avoided, because pharmacovigilance audits assess pharmaceutical companies' compliance with worldwide laws, regulations, and FDA guidance. Companies therefore need to handle enormous amounts of data while remaining compliant with changing regulations around the world and while maintaining and improving the information contained in individual case safety reports. The cost of pharmacovigilance is rising with the exponential growth in the number of cases received by pharmaceutical companies. Technical advances such as cloud-based solutions, mobile health devices, artificial intelligence, blockchain, and machine learning can improve the effectiveness of PV and the efficacy of drugs.
Solving Data Challenges In Machine Learning With Automated Tools
Data is the lifeblood of machine learning (ML) projects. At the same time, the data preparation process is one of the main challenges that plague most projects. According to a recent study, data preparation tasks take more than 80% of the time spent on ML projects. Data scientists spend most of their time on data cleaning (25%), labeling (25%), augmentation (15%), aggregation (15%), and identification (5%). This article will talk about the most common data preparation challenges that require data scientists and machine learning engineers to spend so much time on data preparation. We'll also look at how self-service data preparation tools can help in overcoming these challenges.
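As a quick arithmetic check on the figures quoted above, the five listed activities add up to 85% of project time, which is consistent with the claim that data preparation takes more than 80%. The dictionary keys are simply labels for the percentages in the text.

```python
# Share of ML project time per data-preparation activity, as quoted in the article (%).
breakdown = {
    "cleaning": 25,
    "labeling": 25,
    "augmentation": 15,
    "aggregation": 15,
    "identification": 5,
}

total = sum(breakdown.values())
print(f"Data preparation total: {total}%")  # Data preparation total: 85%
print(f"Time left for everything else: {100 - total}%")  # Time left for everything else: 15%
```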
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Quality > Data Cleaning (0.38)
Banks Ramp Up Machine Learning, Work Through Data Challenges - InformationWeek
The use of machine learning is ramping up across many industries, and finance is no exception. The second annual survey and report from the Institute of International Finance offers some insight into the trajectory of the technology in finance, as well as a sense of the obstacles encountered by organizations looking to deploy machine learning. As expected, more organizations are reporting either pilot or production use of machine learning: a full 85% in the new survey, compared with only 58% of surveyed firms reporting production or pilot projects in the 2018 survey. Credit scoring and decisioning is the most prominent application, but use for credit monitoring and early warning signals has increased, from 13% in 2018 to 57% in 2019. The surveys for both 2018 and 2019 included 60 financial institutions each year.
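The survey figures above can be compared both as a difference in percentage points and as a ratio. This small Python helper (a hypothetical name, for illustration only) shows that pilot or production use rose by 27 points (58% to 85%), while use for credit monitoring and early warning signals rose by 44 points, more than a fourfold increase (13% to 57%).

```python
from typing import Tuple


def change(before_pct: float, after_pct: float) -> Tuple[float, float]:
    """Return (difference in percentage points, ratio after/before)."""
    return after_pct - before_pct, after_pct / before_pct


for label, before, after in [
    ("pilot or production use", 58, 85),
    ("credit monitoring / early warning", 13, 57),
]:
    points, ratio = change(before, after)
    print(f"{label}: +{points} points, x{ratio:.2f}")
# pilot or production use: +27 points, x1.47
# credit monitoring / early warning: +44 points, x4.38
```

Reporting both numbers avoids a common ambiguity: a "44% increase" and a "44-point increase" describe very different changes.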
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.40)
Data Challenges Are Halting AI Projects, IBM Executive Says
"And so you run out of patience along the way, because you spend your first year just collecting and cleansing the data," said Mr. Krishna, who was interviewed at The Wall Street Journal's Future of Everything Festival last week. "And you say: 'Hey, wait a moment, where's the AI?'" Mr. Krishna didn't name clients or say how many had halted projects. One well-known example of an AI project unraveling happened in 2017 at the University of Texas' MD Anderson Cancer Center, which aimed to use IBM's AI platform, Watson, to improve cancer care. An audit by the University of Texas showed the cancer center was using old data, among other issues. A report this month by Forrester Research Inc. found that data quality is among the biggest AI project challenges. Forrester analyst Michele Goetz said companies pursuing such projects generally lack an expert understanding of what data is needed for machine-learning models and struggle with preparing data in a way that's beneficial to those systems. She said producing high-quality data involves more than just reformatting or correcting errors: data needs to be labeled so that an explanation can be provided when questions are raised about the decisions machines make. While AI failures aren't much talked about, Ms. Goetz said companies should be prepared for them and use them as teachable moments. "Rather than looking at it as a failure, be mindful about, 'What did you learn from this?'" she said. Mr. Krishna said he couldn't specify what percentage of IBM-related AI projects were halted over the past five years. But he said: "In the world of IT in general, about 50% of projects run either late, over budget or get halted."
- Information Technology (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.81)